** Read and merge pfaf4 level dams data and some other pfaf4 level charactertistics

* Also reads files from MS-Basic code interpretting upstream-downstream relationships 
* Creates Stata dataset version of this info then merges GRandD data to create upstream dam info
* Also merges with country-pfaf match to determine downstream country counts (for IV)

timer clear

log using read_pfaf4char, replace
timer on 1
clear



* data from GIS "spatial join" of dams to pfaf4 codes
import delimited "../data/grand_pfaf4_global.txt", clear


**# Generate dam counts

*first, gen separate vars for diff dam types
*will =0 for for missing uses

gen irrig=(use_irr=="Main" | use_irr=="Major")
gen hydro=(use_elec=="Main" | use_elec=="Major")
gen supply=(use_supp=="Main" | use_supp=="Major")
gen flood=(use_fcon=="Main" | use_fcon=="Major")


sort grand_id

*clean/prep grand variables for collapsing by means/sums
*grand missing numerics are coded to -99
replace cap_mcm=. if cap_mcm<0
replace cap_rep=. if cap_rep<0
replace area_skm=. if area_skm<0
replace dor_pc=. if dor_pc<0
replace year=. if year<0
replace dam_hgt_m=. if dam_hgt_m<0
*6.88% of dams (472) missing height, only 8 dams missing cap_mcm 

*only dams with capacity>=.1km3 are necessarily included in GRanD 
* variable to restrict attention to these
* capacity in million m3 
gen largedam=(cap_mcm>=100) & !missing(cap_mcm)
gen largeirsuhy=largedam*(irrig | supply | hydro)
gen irsuhy=(irrig | supply | hydro)

*reservoir capacity for only hydro dams
gen cap_mcm_hydro=cap_mcm*hydro


sort conpfaf4 

#delimit ;

collapse (sum) count_irr=irrig count_flood=flood count_supply=supply count_hydro=hydro 
count_large=largedam rescap=cap_mcm rescaphydro=cap_mcm_hydro resarea=area_skm height=dam_hgt_m 
(count) count_dams=grand_id , by(conpfaf4) ;


#delimit cr 

label var resarea "Reservoir area (sq km)"

* 3 dams outside of any pfaf4, drop them
drop if conpfaf4<10000

sort conpfaf4


tempfile dams
save `dams'


**# Other pfaf4 charactertics from export of Hydro1k shapefile attribute table
import delimited "../data/pfaf4char_global.txt", clear

rename level4 pfaf4
rename mean_slope_mean slope
rename max_slope_mean max_slope
rename shape_area pfaf4area
rename mean_cti_mean cti
rename mean_aspect_mea aspect
drop objectid shape_length mean_dem_mean


label var slope "Slope (mean) in subbasin"
label var max_slope "Max slope in subbasin"
label var cti "Wetness index"
label var aspect "Aspect"
label var pfaf4area "Subbasin area"

drop if pfaf4<0

sort conpfaf4


merge 1:1 conpfaf4 using `dams'

* if missing from dams data, set dams count to zero
foreach var of varlist  count_* rescap rescaphydro resarea height {
   replace `var'= 0  if _merge==1
   local nocount=substr("`var'",max(strpos("`var'","_")+1,1),.)
   *generate density_`nocount' = `var'/pfaf4area
 }


drop if _merge==2
drop _merge

gen ifdam=count_dams>0
gen ifhydro=count_hydro>0

label var count_dams "Total dams (#)"
label var count_hydro "Hydro dams (#)"
label var count_supply "Supply dams (#)"
label var count_irr "Irrigation dams (#)"
label var count_large "Large dams (#)"
label var count_flood "Flood control dams (#)"
label var ifdam "Dam" 
label var ifhydro "Hydro dam"
label var height "Dam height"
label var rescap "Total reservoir capacity (mcm)"
label var rescaphydro "Hydro reservoir capacity (mcm)"

label define contlabel 1 "AF" 2 "AS" 3 "AU" 4 "EU" 5 "NA" 6 "SA"

label value continent contlabel
label var conpfaf4 "Pfaf4, unique by continent" 


sort conpfaf4

tempfile damsplus

save `damsplus'


*Population by pfaf4 from ArcMap
import delimited "../data/pop2000_pfaf4.txt", clear

rename mean pop2000

label var pop2000 "Average population density in 2000 in pfaf4"

keep conpfaf4 pop2000

sort conpfaf4


merge 1:1 conpfaf4 using `damsplus'
drop if _merge==2
drop _merge

tempfile pfaf4

save `pfaf4'

**# Upstream basin info

foreach cont in Africa Asia Australasia Europe North_America South_America {
import excel using "../data/level4_up_basins/`cont'_Level_4UpBasins.xlsx", sheet("Sheet1") firstrow case(lower) clear
drop fid objectid
drop if level4<0
reshape long up, i(level4) j(j)
drop if up==.
tempfile `cont'_up
save ``cont'_up'
}

use `South_America_up'

append using `Africa_up' `Asia_up' `Australasia_up' `Europe_up' `North_America_up'

replace up=continent*10000+up

rename j upstream_count 

label var upstream_count "How many basins upstream"

* save this file because needed later to merge upstream droughts
save "../data/upstream_basins.dta", replace

**# Add characteristics of upstream basins

use "../data/upstream_basins.dta", clear
* change names to merge on characteristics of upstream basins, not this basin
rename conpfaf4 start_basin
rename up conpfaf4

sort conpfaf4

*merge to dam data by pfaf4 for upstream basin
merge m:1 conpfaf4 using `damsplus', keep(match)
drop _merge

sort start_basin
* save for use in closest upstream basin

tempfile up
save `up'

collapse (sum) upstream_dams=count_dams upstream_hydro=count_hydro  ///
upstream_irr=count_irr upstream_flood=count_flood upstream_supply=count_supply ///
upstream_height=height upstream_rescap=rescap upstream_area=pfaf4area ///
(max) upstream_subbasins=upstream_count, by(start_basin)

rename start_basin conpfaf4

foreach type in hydro irr flood supply {
	label var upstream_`type'  "Upstream `type' dams (#)"
}

label var upstream_dams "Upstream all dams (#)"
label var upstream_height "Total height of dams upstream"
label var upstream_area "Total upstream basin area"
label var upstream_subbasins "Count of upstream subbasins"
label var upstream_rescap "Total upstream reservoir capacity"

sort conpfaf4

tempfile all_up

save `all_up'

*just characteristics of the nearest upstream basin

use `up', clear
keep if upstream_count==1

drop conpfaf4
rename start_basin conpfaf4
foreach var in ifdam ifhydro height rescap count_dams count_hydro pfaf4area {
	local lbl : variable label `var'
	rename `var' up1_`var'
	label var up1_`var'  "Near upstream `lbl'"
}


keep conpfaf4 up1_*

sort conpfaf4

merge 1:1 conpfaf4 using `all_up'
drop _merge


merge 1:1 conpfaf4 using `pfaf4'
gen no_upstream_subbasin=(_merge==2)
drop if _merge==1
drop _merge

label var no_upstream_subbasin "Pfaf4 with no upstream basins"

tempfile own_up

save `own_up'

**# Downstream country characteristics
** used for IVs

* list of countries for each pfaf4
*result of "Union" command between pfaf4 and country shapefiles in GIS
import delimited "../data/pfaf4_country_match.txt", clear

keep conpfaf4 iso_cc
duplicates drop
sort conpfaf4

* because a pfaf4 may have multiple countries need one listing per pfaf-country pair
by conpfaf4: gen ci=_n
by conpfaf4: egen numcountry=max(ci)

reshape wide iso_cc, i(conpfaf4 numcountry) j(ci)

tempfile pfaf4_country

save `pfaf4_country'

**# Downstream basin info



foreach cont in Africa Asia Australasia Europe North_America South_America {
import excel using "../data/level4_down_basins/`cont'_Level_4DownBasins.xlsx", sheet("Sheet1") firstrow case(lower) clear
drop if start_basin<0
drop fid objectid start_basin
reshape long down, i(conpfaf4) j(j)
replace down=. if down<0 
drop if down==.
tempfile `cont'_down
save ``cont'_down'
}

append using `Africa_down' `Asia_down' `Australasia_down' `Europe_down' `North_America_down'

rename conpfaf4 start_basin

replace down=continent*10000+down

bysort start_basin: egen dncount=max(j)
drop j

* now merge downstream basins to country IDs
rename down conpfaf4
sort conpfaf4

merge m:1 conpfaf4 using `pfaf4_country'
*** need to drop basins that don't appear downstream of anyone
drop if _merge==2
drop _merge

* this data by downstream pfaf, now need to collect all downstream countries
* into single record for the start_basin
rename conpfaf4 dnpfaf4
sort start_basin

* first reshape to 1 dnbasin 1 obs format
gen ID = _n
reshape long iso_cc, i(ID) j(num)
drop if missing(iso_cc)

* get rid of same country in different downstream subbasin
duplicates drop start_basin iso_cc, force

bysort start_basin: replace num = _n
by start_basin: egen dncountry_count=max(num)

replace dncountry_count=dncountry_count-1 if dncountry_count>0

rename iso_cc dncountry_iso

drop ID numcountry dnpfaf4
reshape wide dncountry_iso, i(start_basin dncountry_count) j(num)

*** remerge with original country codes to check and add back in obs with no downbasin
rename start_basin conpfaf4
sort conpfaf4

merge 1:1 conpfaf4 using `pfaf4_country'

replace dncountry_count=0 if _merge==2 

drop _merge

label var numcountry "Num. countries in this pfaf4"
label var dncountry_count "Num. foreign downstream countries" 

rename iso_cc* country*

sort conpfaf4

tempfile dn_country
save `dn_country'

use `own_up', clear

merge 1:1 conpfaf4 using `dn_country', keepusing(dncountry_count numcountry country1 country2)

rename country1 country
label var country "Country (ISO code)"
label var country2 "Second country ISO code, if present"

drop if _merge==2
drop _merge


compress

save "../data/pfaf4_char.dta", replace

timer off 1
timer list 1


log close
